An interleaver is a critical component for the channel coding performance of turbo codes. Algebraic 
constructions are of particular interest because they admit analytical designs and simple, practical 
hardware implementation. Contention-free interleavers have been recently shown to be suitable for 
parallel decoding of turbo codes. In this correspondence, it is shown that permutation polynomials 
generate maximum contention-free interleavers, i.e., every factor of the interleaver length becomes a 
possible degree of parallel processing of the decoder. Further, it is shown by computer simulations 
that turbo codes using these interleavers perform very well for the 3rd Generation Partnership Project 
(3GPP) standard. 

Index Terms 

Turbo code, interleaver, permutation polynomial, contention-free, algebraic, quadratic, parallel processing. 
I. Introduction 

Interleavers for turbo codes [1]-[12] have been extensively investigated. Recently, Sun and Takeshita [1] suggested the use of permutation polynomial-based interleavers over integer rings. In particular, quadratic polynomials were emphasized; this quadratic construction is markedly different from, and superior to, the one proposed earlier by Takeshita and Costello [7]¹ in turbo coding applications. The algebraic approach in [1] was shown to admit the analytical design of an interleaver matched to the constituent convolutional codes. The resulting performance was shown to be better than that of S-random interleavers [11] for relatively short block lengths and parallel concatenated turbo codes; we show in this correspondence that excellent performance can be obtained even for moderate block lengths (4096 information bits). An iterative turbo decoder needs both an interleaver and a deinterleaver. Ryu and Takeshita have also shown a necessary and sufficient condition for a quadratic permutation polynomial to admit a quadratic inverse [13]. Moreover, the simplicity of the algebraic construction in [1] leads to efficient implementations, as demonstrated in [14]. 

The decoding of turbo codes is performed by an iterative process in which so-called extrinsic information is exchanged between sub-blocks² of the iterative decoder. Parallel processing of the iterative decoding of turbo codes is of interest for high-speed decoders. Aspects of implementing parallel decoders in chips, and their expected performance, are studied in [15]. The interleaving of extrinsic information is an important aspect to be addressed in parallel decoders because memory access contention, as explained later in this section, may appear during the exchange of extrinsic information between the sub-blocks of the iterative decoder [5]. The first approaches to the memory access contention problem simply avoided it by constraining the interleavers to be contention-free, as in [2], [3], [5], [16]. For this type of constrained construction, Nimbalker et al. have shown that only a very small fraction of all interleavers are suitable for parallel processing of iterative decoding [2]. They have also proposed a new construction of a modified dithered relatively prime³ (DRP) interleaver [8]. If the interleaver is required to be left unconstrained (e.g., the interleaver cannot be modified because it is already part of a standard), then the memory contention problem can still be solved, as shown in [17], [18], but at the cost of additional complexity. 

¹ The construction in [7] typically generates interleavers with the performance and statistics of a randomly generated interleaver, but with the advantage of a very simple generation. 

² There are typically two or more sub-blocks in an iterative turbo decoder, each implementing a soft-input soft-output decoding algorithm of a convolutional code. 

³ The DRP interleaver construction is one of the best known for turbo codes, with excellent error rate performance. 

In this correspondence, we approach the memory contention problem using constrained interleavers that are contention-free. The advantages of our approach are its low complexity, which follows from an algebraic solution, and the absence of any apparent error rate performance degradation relative to other good interleavers. The contention-free condition is illustrated in Fig. 1 through an arbitrary device (not necessarily a turbo decoder). The device has two sub-blocks. Each of the N = 16 cells in sub-block 0 needs to fetch data in a one-to-one fashion from the N = 16 cells in sub-block 1. If sub-block 0 processes data in a serial fashion, and its cells fetch data sequentially from left to right (x = 0, x = 1, ..., x = 15), then the sequence (f(0) = 0, f(1) = 7, ..., f(15) = 13) indicates the addresses of the cells in sub-block 1 from which data are extracted. The function f(x) describes the interleaver. If sub-block 0 processes data in a parallel fashion using M = 4 processors, then sub-block 0 is split into windows of size W = 4. A cell in a window has an offset value 0 ≤ j < W (different values of offsets are shown in different shades). Each of the four processors fetches data simultaneously, always at a particular offset j. The contention-free property requires that, for a fixed offset, exactly one cell is accessed from each of the four windows in sub-block 1. An example is shown in Fig. 1 for the offset j = 2. This implies that if the cells in sub-block 1 are organized in four independent memory units, one for each of the windows A, B, C and D, then we do not need to worry about memory contention, i.e., two or more processors in sub-block 0 trying to simultaneously fetch data from the same memory unit in sub-block 1. 

In turbo coding applications, this property is also desirable in the reverse direction, i.e., when sub-blocks 0 and 1 switch roles. A mathematical description of the contention-free condition from [2] is now given. The exchange and processing of a sequence of N = MW extrinsic information symbols between sub-blocks of the iterative decoder can be parallelized by M processors working on windows of length W in each sub-block without contending for memory access, provided that the following condition holds for both the interleaver f(x), 0 ≤ x < N, and the deinterleaver $g(x) = f^{-1}(x)$: 

$$\left\lfloor \frac{\pi(j + tW)}{W} \right\rfloor \neq \left\lfloor \frac{\pi(j + vW)}{W} \right\rfloor \qquad (1)$$



February 1, 2008 



DRAFT 






[Figure: a device with sub-block 0 (windows 0 to 3, cells at offset j shaded alike) and sub-block 1 (windows A, B, C and D); the cells x = 0, 1, ..., 15 of sub-block 0 are connected to the cells of sub-block 1 through the interleaver f(x).]

Fig. 1. Example of a contention-free property for W = 4, M = 4, and N = 16. 



where 0 ≤ j < W, 0 ≤ t < v < N/W, and π(·) is either f(·) or g(·). 
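Condition (1) lends itself to a direct check. The following Python sketch is only an illustration, assuming the interleaver is stored as a list `pi` with `pi[x]` the address in sub-block 1 read by cell x; the helper names `is_contention_free` and `inverse` and the small test permutation are hypothetical, not taken from [2]. Because (1) must hold for both f and g, the check is meant to be applied to `pi` and to `inverse(pi)`.

```python
def is_contention_free(pi, W):
    """Check condition (1): for every offset j, the M = N/W addresses
    pi[j + t*W] (t = 0, ..., M-1) must fall into M distinct windows."""
    N = len(pi)
    if N % W != 0:
        raise ValueError("window size W must divide N")
    M = N // W
    for j in range(W):
        windows = {pi[j + t * W] // W for t in range(M)}  # floor(pi(j+tW)/W)
        if len(windows) != M:  # two processors hit the same memory unit
            return False
    return True


def inverse(pi):
    """Deinterleaver g = pi^{-1}, so that g[pi[x]] == x."""
    g = [0] * len(pi)
    for x, y in enumerate(pi):
        g[y] = x
    return g


# Hypothetical N = 16 example: the identity is trivially contention-free,
# while swapping addresses 1 and 4 puts pi(0) = 0 and pi(4) = 1 in window 0.
identity = list(range(16))
swapped = identity[:]
swapped[1], swapped[4] = swapped[4], swapped[1]
print(is_contention_free(identity, 4), is_contention_free(swapped, 4))  # True False
```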

If an interleaver is contention-free for all window sizes W dividing the interleaver length N, it will be called a maximum contention-free interleaver. We show in this correspondence that permutation polynomials over integer rings always generate maximum contention-free interleavers. 

This correspondence is organized as follows. In Section II, we review a result for quadratic permutation polynomials [1], [13] over the integer ring $\mathbb{Z}_N$ and an elementary number theory proposition [19] needed for the main theorem. The main result is derived in Section III, and examples and computer simulation results are given in Section IV. Finally, conclusions are discussed in Section V. 

II. Quadratic Permutation Polynomials over Integer Rings 

In this section, we establish notation, restate the criterion for the existence of quadratic permutation polynomials over integer rings, and restate a result from number theory. The interested reader is referred to [1], [13] for further details. Given an integer N ≥ 2, a polynomial⁴ $f(x) = f_1 x + f_2 x^2 \pmod N$, where $f_1$ and $f_2$ are non-negative integers, is said to be a quadratic permutation polynomial over the ring of integers $\mathbb{Z}_N$ when f(x) permutes $\{0, 1, 2, \ldots, N - 1\}$ [1], [20]. 

⁴ It can be shown that excluding a constant coefficient $f_0$ from f(x) does not make the problem less general. 

In this correspondence, let the set of primes be $\mathcal{P} = \{2, 3, 5, 7, \ldots\}$. Then an integer N can be factored as $N = \prod_{p \in \mathcal{P}} p^{n_{N,p}}$, where $n_{N,p} \ge 1$ for a finite number of p's and $n_{N,p} = 0$ otherwise. For example, if $N = 3888 = 2^4 \times 3^5$, we have $n_{3888,2} = 4$ and $n_{3888,3} = 5$. For a quadratic polynomial $f(x) = f_1 x + f_2 x^2 \pmod N$, we will abuse the previous notation by writing $f_2 = \prod_{p \in \mathcal{P}} p^{n_{F,p}}$, i.e., the exponents of the prime factors of $f_2$ will be written as $n_{F,p}$ instead of the more cumbersome $n_{f_2,p}$, because we will be mainly interested in the factorization of the second-degree coefficient. 
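For concreteness, the exponents $n_{N,p}$ can be obtained by trial division. A minimal Python sketch (the helper name `prime_exponents` is hypothetical) reproduces the example $N = 3888$:

```python
def prime_exponents(n):
    """Return {p: n_{n,p}} for every prime p with n_{n,p} >= 1 (trial division)."""
    exps, p = {}, 2
    while p * p <= n:
        while n % p == 0:
            exps[p] = exps.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:  # whatever is left over is a single prime factor
        exps[n] = exps.get(n, 0) + 1
    return exps


print(prime_exponents(3888))  # {2: 4, 3: 5}
```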

We write a | b if a divides b, and a ∤ b otherwise. The greatest common divisor of a and b is denoted by gcd(a, b). The necessary and sufficient condition for a quadratic polynomial f(x) to be a permutation polynomial is given in the following proposition. 

Proposition 1 [13], [1]: Let $N = \prod_{p \in \mathcal{P}} p^{n_{N,p}}$. The necessary and sufficient condition for a quadratic polynomial $f(x) = f_1 x + f_2 x^2 \pmod N$ to be a permutation polynomial can be divided into two cases. 

1) Either 2 ∤ N or 4 | N (i.e., $n_{N,2} \ne 1$): 

$\gcd(f_1, N) = 1$ and $f_2 = \prod_{p \in \mathcal{P}} p^{n_{F,p}}$, $n_{F,p} \ge 1$, $\forall p$ such that $n_{N,p} \ge 1$. 

2) 2 | N and 4 ∤ N (i.e., $n_{N,2} = 1$): 

$f_1 + f_2$ is odd, $\gcd(f_1, N/2) = 1$ and $f_2 = \prod_{p \in \mathcal{P}} p^{n_{F,p}}$, $n_{F,p} \ge 1$, $\forall p$ such that $p \ne 2$ and $n_{N,p} \ge 1$. 

How many permutation polynomials are there? For example, if the interleaver length is N = 256, then we determine from case 1) of Proposition 1 that $f_1 \in \{1, 3, 5, \ldots, 255\}$ (the set of numbers relatively prime to N) and $f_2 \in \{2, 4, 6, \ldots, 254\}$ (the set of non-zero numbers that contain 2 as a factor). This gives 128 × 127 = 16256 possible pairs of coefficients $f_1$ and $f_2$ that make f(x) a permutation polynomial; if N is a power of two, then there are approximately $N^2/4$ possible pairs of coefficients. However, if N is a prime number, then there are no permutation polynomials of the form f(x) with a non-zero $f_2$. This may be perceived as a deficiency of the construction because certain interleaver lengths must be avoided. However, even restricting N to powers of two gives plenty of possibilities and covers meaningful interleaver lengths. In general, the number of permutation polynomials is not a smooth function of N. 
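Proposition 1 is easy to test against brute force for small N, and the count above follows directly from it. The sketch below assumes the coefficients are reduced modulo N; `is_qpp` encodes the two cases of Proposition 1 and `permutes` is an exhaustive check (both names are hypothetical). Restricting to non-zero $f_1$ and $f_2$ reproduces the 128 × 127 pairs for N = 256.

```python
from math import gcd


def prime_factors(n):
    """Distinct prime factors of n, by trial division."""
    ps, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            ps.add(d)
            n //= d
        d += 1
    if n > 1:
        ps.add(n)
    return ps


def is_qpp(f1, f2, N):
    """Proposition 1 test for f(x) = f1*x + f2*x^2 (mod N)."""
    primes = prime_factors(N)
    if N % 2 == 1 or N % 4 == 0:  # case 1): n_{N,2} != 1
        return gcd(f1, N) == 1 and all(f2 % p == 0 for p in primes)
    # case 2): 2 | N but 4 does not divide N, i.e. n_{N,2} == 1
    return ((f1 + f2) % 2 == 1 and gcd(f1, N // 2) == 1
            and all(f2 % p == 0 for p in primes if p != 2))


def permutes(f1, f2, N):
    """Brute force: does x -> f1*x + f2*x^2 mod N hit every residue once?"""
    return len({(f1 * x + f2 * x * x) % N for x in range(N)}) == N


# Count coefficient pairs with 0 < f1, f2 < N for N = 256 (case 1).
N = 256
count = sum(is_qpp(f1, f2, N) for f1 in range(1, N) for f2 in range(1, N))
print(count)  # 16256 = 128 * 127
```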

We write x ≡ y (mod N) to denote that x is congruent to y modulo N; this means that there exists an integer k such that x = y + kN. The following elementary number theory proposition is used for deriving the main theorem (Theorem 1) of this correspondence. 

Proposition 2 [19]: Let M be an integer. Suppose that M | N and that x ≡ y (mod N). Then x ≡ y (mod M). 

The proof follows by noting that x = y + kN = y + kWM, where W = N/M. 

III. Maximum Contention-Free Permutation Polynomials Interleavers 

The following definition extends the contention-free property to its maximum extent. 

Definition 1: An interleaver is maximum contention-free (MCF) when it is contention-free for every window size W that is a factor of the interleaver length N. 

This definition implies that we potentially have a degree of parallel processing of any soft-input soft-output algorithm by M = N/W processors, i.e., each factor of N is a possible number of parallel processors. We show that quadratic permutation polynomials always generate interleavers that are MCF. 
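Definition 1 can be tested exhaustively for moderate N. The Python sketch below (hypothetical helper names; it simply applies condition (1) to every divisor W of N, for the interleaver and its inverse) checks Example 1 of Table I, $f(x) = 159x + 64x^2 \pmod{256}$:

```python
def is_contention_free(pi, W):
    """Condition (1) for window size W: for each offset j, the addresses
    pi[j + t*W], t = 0..M-1, lie in M distinct windows."""
    M = len(pi) // W
    return all(len({pi[j + t * W] // W for t in range(M)}) == M
               for j in range(W))


def is_mcf(pi):
    """Definition 1: contention-free for every window size W dividing N,
    for both the interleaver and its inverse (the deinterleaver)."""
    N = len(pi)
    g = [0] * N
    for x, y in enumerate(pi):
        g[y] = x
    divisors = [W for W in range(1, N + 1) if N % W == 0]
    return all(is_contention_free(p, W) for p in (pi, g) for W in divisors)


N = 256
f = [(159 * x + 64 * x * x) % N for x in range(N)]  # Example 1 of Table I
print(is_mcf(f))  # True, as Theorem 1 predicts
```

This brute-force test costs on the order of N times the number of divisors of N, so it serves only as a sanity check, not as a substitute for the proof.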

Theorem 1: Let $f(x) = f_1 x + f_2 x^2 \pmod N$ be a quadratic permutation polynomial. Then f(x) generates an MCF interleaver. 

Proof: We first verify condition (1) for f(x) and then for $g(x) = f^{-1}(x)$. 

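Let 

$$Q_t = \left\lfloor \frac{f(j + tW)}{W} \right\rfloor \quad \text{and} \quad Q_v = \left\lfloor \frac{f(j + vW)}{W} \right\rfloor;$$

then 

$$f(j + tW) = Q_t W + [f(j + tW) \bmod W] \quad \text{and} \quad f(j + vW) = Q_v W + [f(j + vW) \bmod W].$$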

We must show that $Q_t \ne Q_v$ for $t - v \not\equiv 0 \pmod M$ and any $0 \le j < W$. Assume $Q_t = Q_v$. Then 

$$Q_t - Q_v = \frac{f(j + tW) - [f(j + tW) \bmod W] - f(j + vW) + [f(j + vW) \bmod W]}{W} = 0. \qquad (2)$$






Using Proposition 2 and observing that 

$$f(j + tW) \equiv f_1 j + f_2 j^2 \pmod W \quad \text{and} \quad f(j + vW) \equiv f_1 j + f_2 j^2 \pmod W, \qquad (3)$$

we conclude that $[f(j + tW) \bmod W] = [f(j + vW) \bmod W]$, and therefore the absolute value of (2) can be simplified as 

$$|Q_t - Q_v| = \left| \frac{f(j + tW) - f(j + vW)}{W} \right| = 0. \qquad (4)$$

By noting that $(j + tW) \ne (j + vW)$ and that f(x) is a permutation polynomial, we conclude that $f(j + tW) \ne f(j + vW)$, and we have a contradiction in (4). 
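The key step (3) can also be confirmed numerically. The sketch below, with a hypothetical helper `check_eq3`, verifies for every window size W dividing N that all indices at offset j reduce to the same residue $f_1 j + f_2 j^2 \bmod W$. Note that (3) follows from Proposition 2 alone, so it holds even for quadratic polynomials that are not permutations.

```python
def check_eq3(N, f1, f2):
    """Verify (3): for every window size W dividing N and every offset j,
    f(j + tW) mod W equals f1*j + f2*j^2 mod W for all t = 0..M-1."""
    def f(x):
        return (f1 * x + f2 * x * x) % N

    for W in range(1, N + 1):
        if N % W:
            continue  # only window sizes that divide N
        M = N // W
        for j in range(W):
            residues = {f(j + t * W) % W for t in range(M)}
            if residues != {(f1 * j + f2 * j * j) % W}:
                return False
    return True


print(check_eq3(1024, 31, 64))  # True (Example 2 of Table I)
```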

To verify condition (1) for the inverse polynomial g(x), we start by observing that permutation polynomials over $\mathbb{Z}_N$ form a finite group under function composition; in particular, f(f(x)) is a permutation polynomial, and the inverse function can be obtained by composing f(x) with itself a sufficient number of times. In group theory parlance, f(x) generates a finite cyclic group G. It now suffices to show that every element of G, which includes the inverse function g(x), satisfies (1). This is easily shown by realizing that (3) implies that f(x) maps the set of indices $A_j = \{j, j + W, j + 2W, \ldots, j + (M - 1)W\}$, i.e., the indices belonging to every possible window at a particular offset j, onto the set of indices $B_k = \{k, k + W, k + 2W, \ldots, k + (M - 1)W\}$, where 

$$k = f_1 j + f_2 j^2 \pmod W. \qquad (5)$$

We conclude that (5) must be a permutation polynomial over $\mathbb{Z}_W$; otherwise, f(x) would not be a permutation polynomial. 

Finally, one uses induction to show that every function obtained by successively composing f(x) with itself (eventually generating the inverse function g(x)) again maps every set $A_j$ onto some set $B_k$, and therefore generates an MCF interleaver. 

■ 
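The group argument can be illustrated on Example 1 of Table I. This Python sketch (helper names are hypothetical) composes f with itself until the identity appears, takes the previous power as $f^{-1}$, compares it with the g(x) listed in Table I, and checks for W = 16 that f maps each $A_j$ onto $B_k$ with k given by (5). It relies only on the finiteness of the cyclic group generated by f, not on the closed-form inverse of [13].

```python
def compose(a, b):
    """Permutation x -> a[b[x]] (apply b first, then a)."""
    return [a[y] for y in b]


def inverse_by_powers(f):
    """Return f^{-1} as the power f^(r-1), where r is the order of f."""
    identity = list(range(len(f)))
    power = f                        # holds f^k
    while True:
        nxt = compose(f, power)      # f^(k+1)
        if nxt == identity:
            return power
        power = nxt


N, f1, f2 = 256, 159, 64
f = [(f1 * x + f2 * x * x) % N for x in range(N)]        # Example 1, Table I
g = inverse_by_powers(f)
g_table = [(95 * x + 64 * x * x) % N for x in range(N)]  # g(x) from Table I
print(g == g_table)  # True

# (5): f maps A_j = {j, j+W, ..., j+(M-1)W} onto B_k, k = f1*j + f2*j^2 mod W
W = 16
for j in range(W):
    k = (f1 * j + f2 * j * j) % W
    assert sorted(f[j + t * W] for t in range(N // W)) == list(range(k, N, W))
```

Repeated composition takes as many steps as the order of f, which is acceptable for this small N; for large N the closed-form inverse of [13] is the practical route.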

We can observe from the previous proof that there exist MCF interleavers generated by permutation polynomials of degrees other than two. In fact, we have the following corollary. 

Corollary 1: Let $f(x) = \sum_{i=0}^{K} f_i x^i \pmod N$ be a permutation polynomial of degree K. Then f(x) generates an MCF interleaver. 
Proof: The proof is identical to that of Theorem 1, except that (3) is replaced by 

$$f(j + tW) \equiv \sum_{i=0}^{K} f_i j^i \pmod W \quad \text{and} \quad f(j + vW) \equiv \sum_{i=0}^{K} f_i j^i \pmod W, \qquad (6)$$

and (5) is replaced by 

$$k = \sum_{i=0}^{K} f_i j^i \pmod W. \qquad (7)$$

■ 
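As an illustration of Corollary 1 beyond degree two, consider the cubic $f(x) = x + 2x^2 + 4x^3 \pmod{16}$, a hypothetical polynomial chosen here only for illustration. The sketch checks, rather than assumes, that it is a permutation, and then applies the exhaustive MCF test of Definition 1 (helper names are hypothetical):

```python
def is_mcf(pi):
    """Contention-free for every W dividing N, for pi and its inverse."""
    N = len(pi)
    inv = [0] * N
    for x, y in enumerate(pi):
        inv[y] = x
    for p in (pi, inv):
        for W in (w for w in range(1, N + 1) if N % w == 0):
            M = N // W
            for j in range(W):
                if len({p[j + t * W] // W for t in range(M)}) != M:
                    return False
    return True


def poly_perm(coeffs, N):
    """Evaluate f(x) = sum_i coeffs[i] * x**i (mod N) for x = 0..N-1."""
    return [sum(c * pow(x, i, N) for i, c in enumerate(coeffs)) % N
            for x in range(N)]


# Hypothetical cubic: f(x) = x + 2x^2 + 4x^3 (mod 16)
f = poly_perm([0, 1, 2, 4], 16)
assert sorted(f) == list(range(16))  # f permutes Z_16
print(f)          # [0, 7, 10, 1, 4, 11, 14, 5, 8, 15, 2, 9, 12, 3, 6, 13]
print(is_mcf(f))  # True, as Corollary 1 predicts
```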

We could have stated Corollary 1 as the main theorem and Theorem 1 as a special case, but we present them in this order because the emphasis in this correspondence is on quadratic permutation polynomials. Naturally, all linear interleavers [10] (also referred to as circular interleavers [3], [11] and relatively prime (RP) interleavers [8]) are MCF. However, the error rate performance of turbo codes using linear interleavers is constrained by the linear interleaver asymptote [10]. The almost regular permutation (ARP) interleavers in [3] (closely related to linear interleavers and DRP interleavers) are stated to have a degree of parallel processing pC dividing N, where C is a design parameter also dividing N and p is any integer. However, we believe that many ARP interleavers (if not all) are MCF, and therefore ARP interleavers are stronger with respect to the degree of parallel processing than what is stated in [3]. The advantage of our construction is a much simpler description of the interleaver by a single permutation polynomial, which we believe makes implementation simpler as well [14]. Moreover, the error performance is not expected to degrade relative to other good interleavers, as shown in the following section. 

IV. Examples and Computer Simulation Results 

We give four examples of interleavers generated by quadratic permutation polynomials in Table I. The respective inverse functions are also given for completeness and were computed using the theory in [13]. Because their MCF property is guaranteed by Theorem 1 regardless of the choice of the permutation polynomials, we only need to select permutation polynomials that yield interleavers with good error rate performance for turbo codes. 
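The inverse pairs in Table I can be verified directly. A minimal Python sketch (the names `EXAMPLES` and `quad` are hypothetical) checks that each f(x) is a permutation of $\mathbb{Z}_N$ and that $g(f(x)) = x$ for every x:

```python
# (N, f1, f2, g1, g2) for Examples 1-4 of Table I
EXAMPLES = [
    (256, 159, 64, 95, 64),
    (1024, 31, 64, 991, 64),
    (4096, 2113, 128, 4033, 1920),
    (15120, 11, 210, 14891, 210),
]


def quad(c1, c2, N):
    """Return the interleaver x -> c1*x + c2*x^2 (mod N) as a list."""
    return [(c1 * x + c2 * x * x) % N for x in range(N)]


for N, f1, f2, g1, g2 in EXAMPLES:
    f, g = quad(f1, f2, N), quad(g1, g2, N)
    ok = sorted(f) == list(range(N)) and all(g[f[x]] == x for x in range(N))
    print(N, ok)  # prints "<N> True" for each example
```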

TABLE I 

Examples of MCF interleavers 

Example  N      f(x)                     g(x)                      D   d_min
1        256    159x + 64x^2 (mod N)     95x + 64x^2 (mod N)       16  27
2        1024   31x + 64x^2 (mod N)      991x + 64x^2 (mod N)      32  27
3        4096   2113x + 128x^2 (mod N)   4033x + 1920x^2 (mod N)   64  -
4        15120  11x + 210x^2 (mod N)     14891x + 210x^2 (mod N)   20  -

(D: spread factor (8); d_min: true minimum distance of the associated turbo code; -: not available.) 

The interleavers in Examples 1-3 were found by a limited search for good polynomials, using mainly the theory in [1], checking the true minimum distance $d_{\min}$ of the associated turbo codes with the algorithm in [21] (the algorithm only finished within a reasonable amount of time for Examples 1 and 2), and finally running computer simulations. To the best of our 
knowledge, one of the most accepted indicators for a good interleaver with respect to error 
performance for parallel concatenated turbo codes is the spread factor [11], [22] defined as 

$$D = \min_{\substack{i, j \in \{0, 1, \ldots, N - 1\} \\ i \ne j}} \left\{ |i - j| + |f(i) - f(j)| \right\}. \qquad (8)$$

The upper bound on the spread factor was proved in [23] to be $\sqrt{2N}$ and was shown earlier [11] to be achievable or closely approximated with carefully chosen linear interleavers. However, the error rate performance of turbo codes using any linear interleaver is constrained by the linear interleaver asymptote [10]. Therefore, maximizing the spread factor alone is not sufficient to guarantee good error performance. Nevertheless, the spread factors D are computed for our examples as a point of reference, because many good constructions attempt to maximize the spread factor. The spread factors obtained for Examples 1-3 (i.e., the codes simulated for this correspondence) are approximately 70% of the upper bound $\sqrt{2N}$, independently of the degree of parallel processing, because we use a fixed interleaver. Interestingly, the spread factors obtained in [15] are also close to 70% of $\sqrt{2N}$ when the degree of parallel processing is M = 1 (serial processing), with a small decrease as the degree of parallel processing increases; the interleavers found therein are all different for each degree of parallel processing, and the search algorithm is designed to maximize the spread factor. 
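The spread factor (8) can be computed by brute force. The Python sketch below (hypothetical helper `spread_factor`) skips pairs whose index distance $|i - j|$ already reaches the current minimum, since they cannot lower it; for Example 1 of Table I it returns D = 16, about 70% of $\sqrt{2N} \approx 22.6$.

```python
def spread_factor(pi):
    """Spread factor (8): min over i != j of |i - j| + |pi(i) - pi(j)|.
    Pairs with |i - j| >= current best cannot improve it and are skipped."""
    N = len(pi)
    best = 2 * N  # larger than any possible value (at most 2N - 2)
    for i in range(N):
        for j in range(i + 1, min(N, i + best)):
            best = min(best, (j - i) + abs(pi[i] - pi[j]))
    return best


N = 256
f = [(159 * x + 64 * x * x) % N for x in range(N)]  # Example 1 of Table I
D = spread_factor(f)
print(D, round(D / (2 * N) ** 0.5, 3))  # 16 0.707
```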

The interleaver in Example 4, chosen by the Jet Propulsion Laboratory, is being considered 
in [14] because of its excellent performance and ease of implementation. Example 4 is also 
interesting because $N = 2^4 \cdot 3^3 \cdot 5 \cdot 7$ is composed of several different prime factors, whereas for Examples 1-3 the interleaver lengths are powers of two. 
In all of the four examples, 16 is a factor of the interleaver length N. This means that we can 
have a sub-block of an iterative decoder split into 16 parallel sections without causing memory 
access contention when exchanging extrinsic information with other sub-blocks. 
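A toy model of this 16-way parallelism: in the sketch below, M = 16 hypothetical processors each own one window of size W = N/16 in sub-block 0, and at time step j processor t reads address f(j + tW) from memory bank ⌊f(j + tW)/W⌋ of sub-block 1. The helper `bank_conflicts` (hypothetical) counts the time steps at which two processors hit the same bank, for both the interleaver and the deinterleaver of each example in Table I. This is a simplified memory model, not a description of any particular decoder architecture.

```python
EXAMPLES = {  # N: (f1, f2, g1, g2), from Table I
    256: (159, 64, 95, 64),
    1024: (31, 64, 991, 64),
    4096: (2113, 128, 4033, 1920),
    15120: (11, 210, 14891, 210),
}
M = 16  # number of parallel processors (a factor of every N above)


def bank_conflicts(N, c1, c2, M):
    """Count time steps at which two of the M processors address the same
    memory bank when reading through x -> c1*x + c2*x^2 (mod N)."""
    W = N // M
    conflicts = 0
    for j in range(W):  # time step = offset within each window
        banks = [((c1 * x + c2 * x * x) % N) // W
                 for x in (j + t * W for t in range(M))]
        conflicts += len(banks) != len(set(banks))
    return conflicts


for N, (f1, f2, g1, g2) in EXAMPLES.items():
    print(N, bank_conflicts(N, f1, f2, M), bank_conflicts(N, g1, g2, M))  # N 0 0
```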

We now demonstrate that restricting an interleaver generated by a quadratic polynomial to be MCF does not degrade the error performance of the associated turbo code. On the contrary, the quadratic interleavers generate turbo codes that have excellent error rate performance. The simulated turbo codes are of nominal rate 1/3 for the 3rd Generation Partnership Project (3GPP) standard [24] but use the MCF interleavers generated by the quadratic polynomials in Examples 1-3. We use BPSK modulation and assume an additive white Gaussian noise (AWGN) channel. The frame error rate (FER) performance curves are shown in Fig. 2. We used eight log-MAP decoding iterations and simulated until at least 100 frame errors had been counted. The typical benchmark S-random interleavers [11] were also simulated under the same conditions. In addition, the current 3GPP standard curves are plotted.⁵ Additional reference curves are available in [2]. 

It is observed from Fig. 2 that the FER performance curves of turbo codes using the quadratic permutation polynomial interleavers meet⁶ or exceed the performance of S-random interleavers down to an FER of at least $10^{-4}$. Moreover, from the slope of the curves, we expect them to meet or exceed the error performance of any other interleaver down to an FER of at least $10^{-4}$. 

Fig. 2. FER curves comparing 3GPP interleavers, S-random interleavers, and our MCF quadratic polynomial interleavers. 

⁵ The curve was obtained from [2] but adjusted for any termination-bit rate loss, as was done for our curves. The curves therein had been simulated with eight decoding iterations and until at least 50 frame errors had been counted. 

⁶ The length-4096 quadratic polynomial curve is slightly worse than that of the S-random interleaver at high FERs. 

V. Conclusion 

Nimbalker et al. proved that only a very small fraction of all interleavers are contention-free [2]. In contrast, we have shown the remarkable fact that all permutation polynomials over integer rings generate MCF interleavers. This property is exceptionally important for a high-speed hardware implementation of iterative turbo decoders because it allows the iterative decoding of turbo codes to be processed in parallel by M processors for any positive integer M dividing the interleaver length N. Conversely, if one targets M processors, then it suffices to choose an interleaver length N that is a multiple of M. We have given examples of interleavers based on quadratic polynomials that are MCF. These interleavers generate turbo codes with error rate performance that is expected to meet or exceed that of any known interleaver for the 3GPP standard down to a frame error rate of at least $10^{-4}$. Moreover, MCF interleavers based on quadratic permutation polynomials have virtually the simplest generation algorithm and the fewest input parameters among all known interleavers, which implies a very simple implementation in software or hardware with small memory requirements. 